Code Attention: Translating Code to Comments by Exploiting Domain Features
Abstract
Appropriate comments on code snippets provide insight into code functionality and aid program comprehension. However, because writing comments is costly, many code projects lack adequate comments. Automatic comment generation techniques have been proposed to generate comments from pieces of code and so reduce the human effort of annotating it. Most existing approaches exploit certain correlations (usually specified manually) between code and the generated comments. These correlations can easily be violated when coding patterns change, which degrades comment-generation performance. Furthermore, previous datasets are too small to validate these methods and demonstrate their advantages. In this paper, we first build C2CGit, a large dataset collected from open projects on GitHub that is more than 20× larger than existing datasets. We then propose a new attention module, Code Attention, for translating code to comments, which exploits domain features of code snippets such as symbols and identifiers. By focusing on these specific features, Code Attention is able to understand the structure of code snippets. Experimental results show that the proposed module outperforms existing approaches on BLEU, METEOR, and human evaluation. We also perform ablation studies to determine the effects of the different parts of Code Attention.
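The abstract does not spell out how Code Attention extracts its domain features. As a rough illustration only, the Python sketch below shows one common way to expose symbols and identifiers in a code snippet before passing tokens to an attention-based model: tokenize the snippet, tag each token by type, and split identifiers into subwords. The keyword set, the regular expression, and the functions `tag_tokens` and `split_identifier` are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Tuple

# Hypothetical keyword set for a C-like language (an assumption, not the paper's list).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "public", "static"}

# Match an identifier, a number, or any single non-space character (treated as a symbol).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def split_identifier(name: str) -> List[str]:
    """Split snake_case and camelCase identifiers into lowercase subwords (hypothetical helper)."""
    words: List[str] = []
    for part in name.split("_"):
        # Separate an acronym from a following word, e.g. "HTTPServer" -> "HTTP Server".
        part = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", part)
        # Separate a lowercase letter or digit from a following capital, e.g. "getFile" -> "get File".
        part = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", part)
        words.extend(w.lower() for w in part.split())
    return words


def tag_tokens(code: str) -> List[Tuple[str, str]]:
    """Tokenize a snippet and tag each token as keyword, identifier, number, or symbol (hypothetical helper)."""
    tagged: List[Tuple[str, str]] = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tagged.append((tok, "keyword"))
        elif tok[0].isalpha() or tok[0] == "_":
            tagged.append((tok, "identifier"))
        elif tok[0].isdigit():
            tagged.append((tok, "number"))
        else:
            tagged.append((tok, "symbol"))
    return tagged


if __name__ == "__main__":
    for tok, kind in tag_tokens("if (fileSize > 10) return getFileName();"):
        subwords = split_identifier(tok) if kind == "identifier" else []
        print(tok, kind, subwords)
```

In this sketch multi-character operators such as `==` are split into single-character symbols; a real tokenizer for a specific language would handle those as single tokens. The type tags could then serve as extra input features or as a signal for weighting attention over symbol and identifier tokens, though how Code Attention actually uses them is described in the paper, not here.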
Journal: CoRR
Volume: abs/1709.07642
Published: 2017